Microarray analysis reveals a major direct role of 
DNA copy number alteration in the transcriptional 
program of human breast tumors 
Genomic DNA copy number alterations are key genetic events in the development and progression of human cancers. Here we report a genome-wide microarray comparative genomic hybridization (array CGH) analysis of DNA copy number variation in a series of primary human breast tumors. We have profiled DNA copy number alteration across 6,691 mapped human genes, in 44 predominantly advanced, primary breast tumors and 10 breast cancer cell lines. While the overall patterns of DNA amplification and deletion corroborate previous cytogenetic studies, the high-resolution (gene-by-gene) mapping of amplicon boundaries and the quantitative analysis of amplicon shape provide significant improvement in the localization of candidate oncogenes. Parallel microarray measurements of mRNA levels reveal the remarkable degree to which variation in gene copy number contributes to variation in gene expression in tumor cells. Specifically, we find that 62% of highly amplified genes show moderately or highly elevated expression, that DNA copy number influences gene expression across a wide range of DNA copy number alterations (deletion, low-, mid-, and high-level amplification), that on average, a 2-fold change in DNA copy number is associated with a corresponding 1.5-fold change in mRNA levels, and that overall, at least 12% of all the variation in gene expression among the breast tumors is directly attributable to underlying variation in gene copy number. These findings provide evidence that widespread DNA copy number alteration can lead directly to global deregulation of gene expression, which may contribute to the development or progression of cancer.

Conventional cytogenetic techniques, including comparative genomic hybridization (CGH) (1), have led to the identification of a number of recurrent regions of DNA copy number alteration in breast cancer cell lines and tumors (2-4). While some of these regions contain known or candidate oncogenes [e.g., FGFR1 (8p11), MYC (8q24), CCND1 (11q13), ERBB2 (17q12), and ZNF217 (20q13)] and tumor suppressor genes [RB1 (13q14) and TP53 (17p13)], the relevant gene(s) within other regions (e.g., gain of 1q, 6q22, and 17q22-24, and loss of 8p) remain to be identified. A high-resolution genome-wide map, delineating the boundaries of DNA copy number alterations in tumors, should facilitate the localization and identification of oncogenes and tumor suppressor genes in breast cancer. In this study, we have created such a map, using array-based CGH (5-7) to profile DNA copy number alteration in a series of breast cancer cell lines and primary tumors.

An unresolved question is the extent to which the widespread DNA copy number changes that we and others have identified in breast tumors alter expression of genes within involved regions. Because we had measured mRNA levels in parallel in the same samples (8), using the same DNA microarrays, we had an opportunity to explore on a genomic scale the relationship between DNA copy number changes and gene expression. From this analysis, we have identified a significant impact of widespread DNA copy number alteration on the transcriptional programs of breast tumors.

Materials and Methods 

Tumors and Cell Lines. Primary breast tumors were predominantly large (>3 cm), intermediate-grade, infiltrating ductal carcinomas, with more than 50% being lymph node positive. The fraction of tumor cells within specimens averaged at least 50%. Details of individual tumors have been published (8, 9) and are summarized in Table 1, which is published as supporting information on the PNAS web site, www.pnas.org. Breast cancer cell lines were obtained from the American Type Culture Collection. Genomic DNA was isolated either using Qiagen genomic DNA columns or by phenol/chloroform extraction followed by ethanol precipitation.

DNA Labeling and Microarray Hybridizations. Genomic DNA labeling and hybridizations were performed essentially as described in Pollack et al. (7), with slight modifications. Two micrograms of DNA was labeled in a total volume of 50 microliters, and the volumes of all reagents were adjusted accordingly. "Test" DNA (from tumors and cell lines) was fluorescently labeled (Cy5) and hybridized to a human cDNA microarray containing 6,691 different mapped human genes (i.e., UniGene clusters). The "reference" DNA (labeled with Cy3) for each hybridization was normal female leukocyte DNA from a single donor. The fabrication of cDNA microarrays and the labeling and hybridization of mRNA samples have been described (6).

Data Analysis and Map Positions. Hybridized arrays were scanned on a GenePix scanner (Axon Instruments, Foster City, CA), and fluorescence ratios (test/reference) were calculated using ScanAlyze software (available at http://rana.lbl.gov). Fluorescence ratios were normalized for each array by setting the average log fluorescence ratio for all array elements equal to 0. Measurements with fluorescence intensities more than 20% above background were considered reliable. DNA copy number profiles that deviated significantly from background ratios measured in normal genomic DNA control hybridizations were interpreted as evidence of real DNA copy number alteration (see Estimating Significance of Altered Fluorescence Ratios in the supporting information). When indicated, DNA copy number profiles are displayed as a moving average (symmetric 5-nearest neighbors).
Map positions for arrayed human cDNAs were assigned by identifying the starting position of the best and longest match of any DNA sequence represented in the corresponding UniGene cluster (10) against the "Golden Path" genome assembly (http://genome.ucsc.edu/; Oct 7, 2000 Freeze). For UniGene clusters represented by multiple arrayed elements, mean fluorescence ratios (for all elements representing the same UniGene cluster) are reported. For mRNA measurements, fluorescence ratios are "mean-centered" (i.e., reported relative to the mean ratio across the 44 tumor samples). The data set described here can be accessed in its entirety in the supporting information.

Fig. 1. Genome-wide measurement of DNA copy number alteration by array CGH. (a) DNA copy number profiles are illustrated for cell lines containing different numbers of X chromosomes, for breast cancer cell lines, and for breast tumors. Each row represents a different cell line or tumor, and each column represents one of 6,691 different mapped human genes present on the microarray, ordered by genome map position. Moving-average (symmetric 5-nearest neighbors) fluorescence ratios (test/reference) are depicted using a log2-based pseudocolor scale (indicated), such that red luminescence reflects fold-amplification, green luminescence reflects fold-deletion, and black indicates no change (gray indicates poorly measured data). (b) Enlarged view of DNA copy number profiles across the X chromosome, shown for cell lines containing different numbers of X chromosomes.
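The per-array processing described under Data Analysis and Map Positions (log-ratio normalization, the 20%-above-background reliability filter, smoothing along the genome, per-cluster averaging, and mean-centering of mRNA ratios) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the software used in this study (ratios were computed with ScanAlyze): all function names are hypothetical, log2 ratios (rather than raw ratios) are averaged within a UniGene cluster, and "symmetric 5-nearest neighbors" is read as each gene plus up to two neighbors on each side in map order, computed separately for each chromosome.

```python
import math
from collections import defaultdict

def normalize_log_ratios(ratios):
    """Convert test/reference ratios to log2 and shift them so the array mean is 0."""
    logs = [math.log2(r) for r in ratios]
    mean = sum(logs) / len(logs)
    return [x - mean for x in logs]

def is_reliable(intensity, background, margin=0.20):
    """Keep a spot only if its intensity exceeds background by more than 20%."""
    return intensity > (1.0 + margin) * background

def moving_average(values, half_window=2):
    """Symmetric moving average over genes already sorted by map position.

    half_window=2 averages each gene with up to two neighbors on each side
    (an assumed reading of "symmetric 5-nearest neighbors"). None marks a
    poorly measured gene and is skipped; call this once per chromosome.
    """
    smoothed = []
    for i in range(len(values)):
        window = [v for v in values[max(0, i - half_window): i + half_window + 1]
                  if v is not None]
        smoothed.append(sum(window) / len(window) if window else None)
    return smoothed

def mean_by_cluster(element_ids, log_ratios, cluster_of):
    """Average log ratios of all arrayed elements mapping to the same UniGene cluster."""
    sums, counts = defaultdict(float), defaultdict(int)
    for element, value in zip(element_ids, log_ratios):
        cluster = cluster_of[element]
        sums[cluster] += value
        counts[cluster] += 1
    return {c: sums[c] / counts[c] for c in sums}

def mean_center(log_values):
    """Express one gene's log2 mRNA ratios relative to its mean across samples."""
    mean = sum(log_values) / len(log_values)
    return [x - mean for x in log_values]
```

Skipping poorly measured genes inside the smoothing window, rather than treating them as zero, keeps a single missing spot from pulling an amplicon's smoothed profile toward "no change".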

Results 

We performed CGH on 44 predominantly locally advanced, primary breast tumors and 10 breast cancer cell lines, using cDNA microarrays containing 6,691 different mapped human genes (Fig. 1a; also see Materials and Methods for details of microarray hybridizations). To take full advantage of the improved spatial resolution of array CGH, we ordered (fluorescence ratios for) the 6,691 cDNAs according to the "Golden Path" (http://genome.ucsc.edu/) genome assembly of the draft human genome sequences (11). In so doing, arrayed cDNAs not only themselves represent genes of potential interest (e.g., candidate oncogenes within amplicons), but also provide precise genetic landmarks for chromosomal regions of amplification and deletion. Parallel analysis of DNA from cell lines containing different numbers of X chromosomes (Fig. 1b), as we did before (7), demonstrated the sensitivity of our method to detect single-copy loss (45,XO) and 1.5- (47,XXX), 2- (48,XXXX), or 2.5-fold (49,XXXXX) gains (also see Fig. 5, which is published as supporting information on the PNAS web site). Fluorescence ratios were linearly proportional to copy number ratios but slightly underestimated them, in agreement with previous observations (7). Numerous DNA copy number alterations were evident in both the breast cancer cell lines and primary tumors (Fig. 1a); they were detected in the tumors despite the presence of euploid non-tumor cell types, although the magnitudes of the observed changes were generally lower in the tumor samples. DNA copy number alterations were found in every cancer cell line and tumor, and on every human chromosome in at least one sample. Recurrent regions of DNA copy number gain and loss were readily identifiable. For example, gains within 1q, 8q, 17q, and 20q were observed in a high proportion of breast cancer cell lines/tumors (90%/69%, 100%/47%, 100%/60%, and 90%/44%, respectively), as were losses within 1p, 3p, 8p, and 13q (80%/24%, 80%/22%, 80%/22%, and 70%/18%, respectively), consistent with published cytogenetic studies (refs. 2-4; a complete listing of gains/losses is provided in Tables 2 and 3, which are published as supporting information on the PNAS web site). The total number of genomic alterations (gains and losses) was significantly higher in breast tumors that were high grade (P = 0.008), estrogen receptor negative (P = 0.04), or harboring TP53 mutations (P = 0.0006) (see Table 4, which is published as supporting information on the PNAS web site); the association with high grade is consistent with published CGH data (3).

Fig. 2. DNA copy number alteration across chromosome 8 by array CGH. (a) DNA copy number profiles are illustrated for cell lines containing different numbers of X chromosomes, for breast cancer cell lines, and for breast tumors. Breast cancer cell lines and tumors are separately ordered by hierarchical clustering to highlight recurrent copy number changes. The 241 genes present on the microarrays and mapping to chromosome 8 are ordered by position along the chromosome. Fluorescence ratios (test/reference) are depicted by a log2 pseudocolor scale (indicated). Selected genes are indicated with color-coded text (red, increased; green, decreased; black, no change; gray, not well measured) to reflect correspondingly altered mRNA levels (observed in the majority of the subset of samples displaying the DNA copy number change). The map positions for genes of interest that are not represented on the microarray are indicated in the row above those genes represented on the array. (b) Graphical display of DNA copy number profile for breast cancer cell line SKBR3. Fluorescence ratios (tumor/normal) are plotted on a log2 scale.
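The X-chromosome dosage series serves as a built-in calibration: relative to the 46,XX female reference, X-linked genes are expected at copy number ratios of 0.5 (45,XO), 1.5 (47,XXX), 2 (48,XXXX), and 2.5 (49,XXXXX). A minimal Python sketch of comparing expected ratios with measured fluorescence ratios by ordinary least squares is shown below. The function names are hypothetical and the "observed" values are synthetic placeholders, not measurements from this study; a fitted slope below 1 corresponds to the slight underestimation of copy number ratios noted above.

```python
# Hypothetical helpers for calibrating array CGH ratios against known X dosage.

def expected_x_ratio(n_x, reference_n_x=2):
    """Expected test/reference copy number ratio for X-linked genes (46,XX reference)."""
    return n_x / reference_n_x

def linear_fit(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Number of X chromosomes in each karyotype of the dosage series.
KARYOTYPES = {"45,XO": 1, "46,XX": 2, "47,XXX": 3, "48,XXXX": 4, "49,XXXXX": 5}
expected = [expected_x_ratio(n) for n in KARYOTYPES.values()]

# Synthetic mean X-chromosome ratios that compress the true ratios toward 1
# (illustration only; real values would come from the X-chromosome genes on the array).
observed = [1 + 0.8 * (e - 1) for e in expected]
slope, intercept = linear_fit(expected, observed)
print(f"slope={slope:.2f} intercept={intercept:.2f}")  # slope < 1 means ratios are underestimated
```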

The improved spatial resolution of our array CGH analysis is illustrated for chromosome 8, which displayed extensive DNA copy number alteration in our series. A detailed view of the variation in the copy number of 241 genes mapping to chromosome 8 revealed multiple regions of recurrent amplification; each of these potentially harbors a different known or previously uncharacterized oncogene (Fig. 2a). The complexity of amplicon structure is most easily appreciated in the breast cancer cell line SKBR3. Although a conventional CGH analysis of 8q in SKBR3 identified only two distinct regions of amplification (12), we observed three distinct regions of high-level amplification (labeled 1-3 in Fig. 2b). For each of these regions we can define the boundaries of the interval recurrently amplified in the tumors we examined; in each case, known or plausible candidate oncogenes can be identified (a description of these regions, as well as the recurrently amplified regions on chromosomes 17 and 20, can be found in Figs. 6 and 7, which are published as supporting information on the PNAS web site).
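Delimiting an amplicon on map-ordered genes amounts to finding runs of consecutive genes whose ratios exceed a cutoff; a recurrently amplified interval can then be approximated as the genes amplified in at least a chosen number of samples. The Python sketch below illustrates that idea. The ratio cutoff of 2, the minimum run length, and the minimum sample count are hypothetical parameters chosen for illustration, not the criteria used in this study (where significance was assessed against normal genomic DNA control hybridizations, as described in Materials and Methods).

```python
def runs(flags):
    """Inclusive (start, end) index pairs for each run of consecutive True flags."""
    out, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            out.append((start, i - 1))
            start = None
    if start is not None:
        out.append((start, len(flags) - 1))
    return out

def amplified_segments(ratios, threshold=2.0, min_genes=2):
    """Runs of map-ordered genes with ratio > threshold (None counts as not amplified)."""
    flags = [r is not None and r > threshold for r in ratios]
    return [(s, e) for s, e in runs(flags) if e - s + 1 >= min_genes]

def recurrent_intervals(samples, threshold=2.0, min_samples=2):
    """Gene intervals amplified in at least min_samples of the profiles in samples."""
    counts = [0] * len(samples[0])
    for ratios in samples:
        for s, e in amplified_segments(ratios, threshold, min_genes=1):
            for i in range(s, e + 1):
                counts[i] += 1
    return runs([c >= min_samples for c in counts])

# Toy example with five map-ordered genes on one chromosome arm (hypothetical values).
genes = ["g1", "g2", "g3", "g4", "g5"]
profiles = [[1.0, 3.0, 3.0, 3.0, 1.0], [1.0, 1.0, 3.0, 3.0, 3.0]]
for s, e in recurrent_intervals(profiles):
    print(genes[s], "to", genes[e])
```

Because each arrayed gene is itself a landmark, the boundaries of such an interval are named genes rather than cytogenetic bands, which is what allows candidate oncogenes to be narrowed down.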

For a subset of breast cancer cell lines and tumors (4 and 37, respectively), and a subset of arrayed genes (6,095), mRNA levels were quantitatively measured in parallel by using cDNA microarrays (8). The parallel assessment of mRNA levels is useful in the interpretation of DNA copy number changes. For example, the highly amplified genes that are also highly expressed are the strongest candidate oncogenes within an amplicon. Perhaps more significantly, our parallel analysis of DNA copy number changes and mRNA levels provides us the opportunity to assess the global impact of widespread DNA copy number alteration on gene expression in tumor cells.

A strong influence of DNA copy number on gene expression is evident in an examination of the pseudocolor representations of DNA copy number and mRNA levels for genes on chromosome 17 (Fig. 3). The overall patterns of gene amplification and elevated gene expression are quite concordant; i.e., a significant fraction of highly amplified genes appear to be correspondingly highly expressed. The concordance between high-level amplification and increased gene expression is not restricted to chromosome 17. Genome-wide, of 117 high-level DNA amplifications (fluorescence ratios >4, representing 91 different genes), 62% (representing 54 different genes; see Table 5, which is published as supporting information on the PNAS web site) are associated with at least moderately elevated mRNA levels (mean-centered fluorescence ratios >2), and 42% (representing 36 different genes) are associated with comparably highly elevated mRNA levels (mean-centered fluorescence ratios >4).

Fig. 3. Concordance between DNA copy number and gene expression across chromosome 17. DNA copy number alteration (Upper) and mRNA levels (Lower) are illustrated for breast cancer cell lines and tumors. Breast cancer cell lines and tumors are separately ordered by hierarchical clustering (Upper), and the identical sample order is maintained (Lower). The 354 genes present on the microarrays and mapping to chromosome 17, and for which both DNA copy number and mRNA levels were determined, are ordered by position along the chromosome; selected genes are indicated in color-coded text (see Fig. 2 legend). Fluorescence ratios (test/reference) are depicted by separate log2 pseudocolor scales (indicated).
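The genome-wide concordance figures above count high-level amplification events (gene-sample pairs with a DNA fluorescence ratio >4) separately from the distinct genes they represent. A minimal Python sketch of that bookkeeping follows; it assumes each measurement is supplied as a (gene, DNA ratio, mean-centered mRNA ratio) tuple with None for an unmeasured mRNA value, and the function name, input layout, and toy values are hypothetical.

```python
def amplification_concordance(events, dna_cut=4.0, mrna_cuts=(2.0, 4.0)):
    """events: iterable of (gene, dna_ratio, mrna_ratio) gene-sample measurements.

    Returns (n_events, n_genes, by_cut), where by_cut maps each mRNA cutoff to
    (fraction of high-level amplification events above it, number of distinct genes).
    Events with an unmeasured (None) mRNA ratio count as not elevated.
    """
    high = [(gene, mrna) for gene, dna, mrna in events if dna > dna_cut]
    by_cut = {}
    for cut in mrna_cuts:
        hits = [gene for gene, mrna in high if mrna is not None and mrna > cut]
        fraction = len(hits) / len(high) if high else float("nan")
        by_cut[cut] = (fraction, len(set(hits)))
    return len(high), len({gene for gene, _ in high}), by_cut

# Toy input (hypothetical values): gene "A" is highly amplified in two samples.
example = [("A", 5.0, 3.0), ("A", 6.0, 5.0), ("B", 4.5, 1.0), ("C", 3.0, 10.0)]
n_events, n_genes, by_cut = amplification_concordance(example)
print(n_events, n_genes, by_cut)
```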

To determine the extent to which DNA deletion and lower-level amplification (in addition to high-level amplification) are also associated with corresponding alterations in mRNA levels, we performed three separate analyses on the complete data set (4 cell lines and 37 tumors, across 6,095 genes). First, we determined the average mRNA levels for each of five classes of genes, representing DNA deletion, no change, and low-, medium-, and high-level amplification (Fig. 4a). For both the breast cancer cell lines and tumors, average mRNA levels tracked with DNA copy number across all five classes in a statistically significant fashion (P values for pair-wise Student's t tests comparing adjacent classes: cell lines, 4 × 10^-…, 1 × 10^-…, 5 × 10^-…, 1 × 10^-…; tumors, 1 × 10^-…, 1 × 10^-…, 5 × 10^-…, 1 × 10^-…). A linear regression of the average log(DNA copy number) for each class against the average log(mRNA level) demonstrated that, on average, a 2-fold change in DNA copy number was accompanied by 1.4- and 1.5-fold changes in mRNA level for the breast cancer cell lines and tumors, respectively (Fig. 4a; regression line not shown). Second, we characterized the distribution of the 6,095 correlations between DNA copy number and mRNA level, each computed across the 37 tumor samples (Fig. 4b). The distribution of correlations forms a normal-shaped curve, but with the peak markedly shifted in the positive direction from zero. This shift is statistically significant, as evidenced in a plot of observed vs. expected correlations (Fig. 4c), and reflects a pervasive global influence of DNA copy number alterations on gene expression. Notably, the highest correlations between DNA copy number and mRNA level (the right tail of the distribution in Fig. 4b) comprise both amplified and deleted genes (data not shown). Third, we used a linear regression model to estimate the fraction of all variation measured in mRNA levels among the 37 tumors that could be attributed to underlying variation in DNA copy number. From this analysis, we estimate that, overall, about 7% of all of the observed variation in mRNA levels can be explained directly by variation in copy number of the altered genes (Fig. 4d). We can reduce the effects of experimental measurement error on this estimate by using only the fraction of the data most reliably measured (fluorescence intensity/background >3); using those data, our estimate of the percent variation in mRNA levels directly attributable to variation in gene copy number increases to 12% (Fig. 4d). This still undoubtedly represents a significant underestimate, as the observed variation in global gene expression is affected not only by true variation in the expression programs of the tumor cells themselves, but also by the variable presence of non-tumor cell types within clinical samples.

Fig. 4. Genome-wide influence of DNA copy number alterations on mRNA levels. (a) For breast cancer cell lines (gray) and tumor samples (black), both mean-centered mRNA fluorescence ratio (log2 scale) quartiles (box plots indicate 25th, 50th, and 75th percentiles) and averages (diamonds; error bars indicate standard errors of the mean) are plotted for each of five classes of genes, representing DNA deletion (tumor/normal ratio <0.8), no change (0.8-1.2), low- (1.2-2), medium- (2-4), and high-level (>4) amplification. P values for pair-wise Student's t tests comparing averages between adjacent classes (moving left to right) are 4 × 10^-…, 1 × 10^-…, 5 × 10^-…, 1 × 10^-… (cell lines), and 1 × 10^-…, 1 × 10^-…, 5 × 10^-…, 1 × 10^-… (tumors). (b) Distribution of correlations between DNA copy number and mRNA levels for 6,095 different human genes across 37 breast tumor samples. (c) Plot of observed versus expected correlation coefficients. The expected values were obtained by randomization of the sample labels in the DNA copy number data set. The line of unity is indicated. (d) Percent variance in gene expression (among tumors) directly explained by variation in gene copy number. Percent variance explained (black line) and fraction of data retained (gray line) are plotted for different fluorescence intensity/background (a rough surrogate for signal/noise) cutoff values; fraction of data retained is relative to the 1.2 intensity/background cutoff. Details of the linear regression model used to estimate the fraction of variation in gene expression attributable to underlying DNA copy number alteration can be found in the supporting information (see Estimating the Fraction of Variation in Gene Expression Attributable to Underlying DNA Copy Number Alteration).
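The three analyses of the relationship between DNA copy number and mRNA level can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: class boundaries come from the Fig. 4 legend (a ratio exactly on a boundary is assigned to the higher class, an assumption), the expected correlations use a single random relabeling of DNA samples, and the variance-explained estimate is a simple pooled regression on per-gene-centered log2 values with one common slope, which may differ from the model described in the supporting information. Input dictionaries map each gene to a list of log2 values in a shared sample order. Because the class regression is fit in log2 units, a slope b means a 2-fold DNA change is accompanied by a 2^b-fold mRNA change; the reported 1.5-fold change corresponds to b ≈ 0.585.

```python
import math
import random

# Copy number classes (tumor/normal DNA ratio) from the Fig. 4 legend.
CLASSES = [("deletion", 0.0, 0.8), ("no change", 0.8, 1.2), ("low", 1.2, 2.0),
           ("medium", 2.0, 4.0), ("high", 4.0, math.inf)]

def classify(log2_dna):
    ratio = 2 ** log2_dna
    return next(name for name, lo, hi in CLASSES if lo <= ratio < hi)

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def mrna_fold_per_dna_doubling(pairs):
    """pairs: (log2 DNA, log2 mRNA). Regress class-average log2 mRNA on
    class-average log2 DNA and convert the slope b to a fold change 2**b."""
    groups = {}
    for d, m in pairs:
        groups.setdefault(classify(d), []).append((d, m))
    xs = [sum(d for d, _ in g) / len(g) for g in groups.values()]
    ys = [sum(m for _, m in g) / len(g) for g in groups.values()]
    return 2 ** slope(xs, ys)

def pearson(xs, ys):
    """Pearson correlation; genes with constant values must be excluded beforehand."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def gene_correlations(dna, mrna, order=None):
    """Per-gene correlation across samples; `order` relabels the DNA samples."""
    return {g: pearson(dna[g] if order is None else [dna[g][i] for i in order], mrna[g])
            for g in dna}

def observed_vs_expected(dna, mrna, seed=0):
    """Sorted observed correlations paired with sorted correlations obtained after
    one random permutation of the DNA sample labels (the null expectation)."""
    order = list(range(len(next(iter(dna.values())))))
    random.Random(seed).shuffle(order)
    observed = sorted(gene_correlations(dna, mrna).values())
    expected = sorted(gene_correlations(dna, mrna, order).values())
    return list(zip(expected, observed))

def fraction_variance_explained(dna, mrna):
    """R^2 of a pooled regression of per-gene-centered log2 mRNA on
    per-gene-centered log2 DNA copy number."""
    xs, ys = [], []
    for g in dna:
        mx, my = sum(dna[g]) / len(dna[g]), sum(mrna[g]) / len(mrna[g])
        xs += [x - mx for x in dna[g]]
        ys += [y - my for y in mrna[g]]
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    residual = sum((y - b * x) ** 2 for x, y in zip(xs, ys))
    return 1 - residual / sum(y * y for y in ys)
```

Plotting the pairs from `observed_vs_expected` against the line of unity corresponds to the comparison in Fig. 4c: a systematic excess of observed over expected correlations indicates a genome-wide copy number effect rather than chance.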

Discussion 

This genome-wide array CGH analysis of DNA copy number alteration in a series of human breast tumors demonstrates the usefulness of defining amplicon boundaries at high resolution (gene-by-gene), and quantitatively measuring amplicon shape, to assist in locating and identifying candidate oncogenes. By analyzing mRNA levels in parallel, we have also discovered that changes in DNA copy number have a large, pervasive, direct effect on global gene expression patterns in both breast cancer cell lines and tumors. Although the DNA microarrays used in our analysis may display a bias toward characterized and/or highly expressed genes, because we are examining such a large fraction of the genome (approximately 20% of all human genes), and because, as detailed above, we are likely underestimating the contribution of DNA copy number changes to altered gene expression, we believe our findings are likely to be generalizable (but would nevertheless still be remarkable if only applicable to this set of ~6,100 genes).

In budding yeast, aneuploidy has been shown to result in chromosome-wide gene expression biases (13). Two recent studies have begun to examine the global relationship between DNA copy number and gene expression in cancer cells. In agreement with our findings, Phillips et al. (14) have shown that with the acquisition of tumorigenicity in an immortalized prostate epithelial cell line, new chromosomal gains and losses resulted in a statistically significant increase and decrease, respectively, in the average expression level of involved genes. In contrast, Platzer et al. (15) recently reported that in metastatic colon tumors only ~4% of genes within amplified regions were found more highly (>2-fold) expressed, when compared with normal colonic epithelium. This report differs substantially from our finding that 62% of highly amplified genes in breast cancer exhibit at least 2-fold increased expression. These contrasting findings may reflect methodological differences between the studies. For example, the study of Platzer et al. (15) may have systematically under-measured gene expression changes. In this regard it is remarkable that only 14 transcripts of many thousand residing within unamplified chromosomal regions were found to exhibit at least 4-fold altered expression in metastatic colon cancer. Additionally, their reliance on lower-resolution chromosomal CGH may have resulted in poorly delimiting the boundaries of high-complexity amplicons, effectively overcalling regions with amplification. Alternatively, the contrasting findings for amplified genes may represent real biological differences between breast and metastatic colon tumors; resolution of this issue will require further studies.

Our finding that widespread DNA copy number alteration has a large, pervasive, and direct effect on global gene expression patterns in breast cancer has several important implications. First, this finding supports a high degree of copy number-dependent gene expression in tumors. Second, it suggests that most genes are not subject to specific autoregulation or dosage compensation. Third, this finding cautions that elevated expression of an amplified gene cannot alone be considered strong independent evidence of a candidate oncogene's role in tumorigenesis. In our study, fully 62% of highly amplified genes demonstrated moderately or highly elevated expression. This highlights the importance of high-resolution mapping of amplicon boundaries and shape [to identify the "driving" gene(s) within amplicons (16)], on a large number of samples, in addition to functional studies. Fourth, this finding suggests that analyzing the genomic distribution of expressed genes, even within existing microarray gene expression data sets, may permit the inference of DNA copy number aberration, particularly aneuploidy (where gene expression can be averaged across large chromosomal regions; see Fig. 3 and supporting information). Fifth, this finding implies that a substantial portion of the phenotypic uniqueness (and by extension, the heterogeneity in clinical behavior) among patients' tumors may be traceable to underlying variation in DNA copy number. Sixth, this finding supports a possible role for widespread DNA copy number alteration in tumorigenesis (17, 18), beyond the amplification of specific oncogenes and deletion of specific tumor suppressor genes. Widespread DNA copy number alteration, and the concomitant widespread imbalance in gene expression, might disrupt critical stoichiometric relationships in cell metabolism and physiology (e.g., proteasome, mitotic spindle), possibly promoting further chromosomal instability and directly contributing to tumor development or progression. Finally, our findings suggest the possibility of cancer therapies that exploit specific or global imbalances in gene expression in cancer.
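The fourth implication, inferring copy number aberrations such as aneuploidy from expression data alone, can be illustrated by averaging a sample's mean-centered log2 expression over all measured genes in a chromosomal region (for example, a chromosome arm). The Python sketch below is a hypothetical illustration of that idea only; the function names, region labels, minimum gene count, and the 0.2 log2 cutoff are arbitrary assumptions rather than parameters from this study.

```python
from collections import defaultdict

def regional_expression_bias(gene_regions, sample_values, min_genes=20):
    """gene_regions: (gene, region) pairs, e.g. ("ERBB2", "17q").
    sample_values: {gene: mean-centered log2 mRNA ratio} for one sample.
    Returns {region: average} for regions with at least min_genes measured genes."""
    sums, counts = defaultdict(float), defaultdict(int)
    for gene, region in gene_regions:
        value = sample_values.get(gene)
        if value is None:
            continue  # gene not measured in this sample
        sums[region] += value
        counts[region] += 1
    return {r: sums[r] / counts[r] for r in sums if counts[r] >= min_genes}

def call_regions(averages, cutoff=0.2):
    """Label each region by the sign of its average expression shift."""
    calls = {}
    for region, avg in averages.items():
        if avg > cutoff:
            calls[region] = "possible gain"
        elif avg < -cutoff:
            calls[region] = "possible loss"
        else:
            calls[region] = "no call"
    return calls

# Toy example (hypothetical values); min_genes is lowered to fit the tiny input.
regions = [("a", "1q"), ("b", "1q"), ("c", "8p"), ("d", "8p"), ("e", "3p")]
values = {"a": 0.5, "b": 0.3, "c": -0.4, "d": -0.6, "e": 0.0}
averages = regional_expression_bias(regions, values, min_genes=2)
print(call_regions(averages))
```

Averaging over many genes is what makes this feasible: the per-gene effect of a single-copy change is modest, but it accumulates across a whole arm, whereas focal amplicons spanning few genes would be diluted.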